Extracting English-Korean Transliteration Equivalence from Domain-Specific Dictionaries
Authors
Abstract
Automatic acquisition of translation knowledge, or automatic construction of bilingual dictionaries, has become an important first step for natural language applications such as machine translation and cross-language information retrieval. Transliteration is used to translate proper names and technical terms, especially from languages written in the Roman alphabet into languages written in non-Roman scripts, such as from English into Korean, Japanese, and Chinese. A transliteration equivalence is a set consisting of one foreign word and its possible transliterations. Transliterations are one of the main sources of the out-of-vocabulary (OOV) problem, because transliteration is a productive process. Many Korean domain-specific terms are composed of transliterations. Knowledge of transliteration equivalence is therefore important for natural language applications that process domain-specific texts. In this paper, we propose an algorithm that uses machine transliteration to recognize transliteration equivalences, or transliteration pairs, in domain-specific dictionaries. Our method achieves about 99% precision and 73% recall.
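The abstract does not spell out the recognition algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. In place of a trained machine-transliteration model it assumes a fixed jamo romanization table (roughly Revised Romanization), romanizes the Korean side of each dictionary entry, scores it against the English side by normalized edit distance, and accepts the entry as a transliteration pair above a threshold. Note that this compares in the Korean-to-Roman direction as a simplification. The tables, the `threshold=0.5` value, the toy entries, and all function names (`romanize`, `similarity`, `is_transliteration_pair`, `precision_recall`) are hypothetical. The precision/recall helper only shows how figures like the reported ones are defined.

```python
"""Toy sketch: flag English-Korean dictionary entries whose Korean side looks
like a transliteration of the English side.

Assumption (hypothetical): instead of a trained machine-transliteration model,
Korean is romanized with a fixed jamo table and compared with the English word
by normalized edit distance.
"""

# Romanization of the 19 initial, 21 medial and 28 final jamo, in Unicode order
# (roughly Revised Romanization; finals are simplified).
LEADS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
         "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
TAILS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
         "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]


def romanize(korean: str) -> str:
    """Romanize precomposed Hangul syllables; other characters are lowercased."""
    out = []
    for ch in korean:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:  # precomposed syllable block U+AC00..U+D7A3
            lead, rest = divmod(code, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch.lower())
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(english: str, korean: str) -> float:
    """1 - edit distance / longer length, on English vs. romanized Korean."""
    a, b = english.lower(), romanize(korean)
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def is_transliteration_pair(english: str, korean: str,
                            threshold: float = 0.5) -> bool:
    # threshold is a hypothetical tuning parameter, not a value from the paper
    return similarity(english, korean) >= threshold


def precision_recall(predicted: list[bool],
                     gold: list[bool]) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy entries: (English, Korean, gold label "is a transliteration pair")
    entries = [("computer", "컴퓨터", True), ("system", "시스템", True),
               ("database", "데이터베이스", True), ("memory", "기억", False)]
    preds = [is_transliteration_pair(e, k) for e, k, _ in entries]
    print(preds)                                              # [True, True, False, False]
    print(precision_recall(preds, [g for _, _, g in entries]))  # (1.0, 0.6666666666666666)
```

On these toy entries the sketch accepts 컴퓨터 ("keompyuteo", similarity 0.6) and 시스템, rejects 기억, but misses 데이터베이스 (similarity 5/12) because the romanized form inserts many vowels. That yields perfect precision with lower recall, the same qualitative shape as the reported figures. A trained transliteration model would be the natural replacement for the fixed table.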
Similar resources
Extracting Transliteration Pairs from Comparable Corpora
Transliterating words and names from one language to another is a frequent and highly productive phenomenon. For example, the English word “cache” is transliterated in Japanese as キャッシュ “kyasshu”. In many cases, recent transliterations are not recorded in machine-readable dictionaries, so it is impossible to rely on dictionary lookup to find transliteration equivalents. In this paper we describe a meth...
English-Korean Patent Translation System: FromTo-EK/PAT
This paper addresses a method for customizing an English-Korean machine translation system from the general domain to the patent domain. The customizing method includes the following: (1) extracting and constructing a large bilingual terminology and patent-specific translation patterns, (2) adapting the probabilities of a POS tagger trained on the general domain to the patent domain, (3) syntactically a...
Korean-Chinese Cross-Language Information Retrieval Based on Extension of Dictionaries and Transliteration
This paper describes our Korean-Chinese cross-language information retrieval system. Our system uses a bilingual dictionary to perform query translation. We expand our bilingual dictionary by extracting words and their translations from Wikipedia, an online encyclopedia. To resolve the problem of translating Western people’s names into Chinese, we propose a transliteration mapping met...
Equivalence in Technical Texts: The Case of Accounting Terms in English-Persian Dictionaries
Translating accounting documents in general, and accounting terminology in particular, is not a simple task, especially as new terms keep being created in step with accounting developments. This study was carried out to find the most common and preferable ways to translate accounting terms from English into Persian. An attempt was also made to identify the frequently used patterns of word-fo...
English-to-Korean Transliteration using Multiple Unbounded Overlapping Phoneme Chunks
In this paper we present a method of English-to-Korean (E-K) transliteration and back-transliteration. In Korean technical documents, many English words are transliterated into Korean in various forms. As English words and their Korean transliterations are usually technical terms and proper nouns, it is hard to find a transliteration and its variations in a dictionary. Therefore ...